@FelixMcFelix FelixMcFelix commented Sep 4, 2025

XXX currently sketching out the bare minimum requirements around frame delivery, insertion of Geneve options, etc.

TODO:

  • Add XDE-wide multicast forwarding table (Ipv6Addr -> BTreeMap<(NextHopV6, Replication)>).
  • Add exceptions in source/dest MAC/L3 addr checking for multicast addresses matching known groups. (#cidr-checking only mcast dst side)
  • Populate Mcast2Phys and above table via ioctl.
  • Some tests might be nice? :)
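
One possible shape for the XDE-wide forwarding table from the first TODO item, as a rough sketch only. It reads the `BTreeMap<(NextHopV6, Replication)>` shorthand as a map from next hop to replication mode, which is an assumption. `McastFwdTable`, its methods, and the fields of `NextHopV6` are hypothetical stand-ins rather than the types in this PR; the `Replication` variants follow the three modes named in the commit message.

```rust
use std::collections::BTreeMap;
use std::net::Ipv6Addr;

/// Hypothetical stand-in for the PR's `Replication` type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Replication {
    External,
    Underlay,
    All,
}

/// Hypothetical stand-in for `NextHopV6`: an underlay address plus a VNI.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct NextHopV6 {
    addr: Ipv6Addr,
    vni: u32,
}

/// Assumed layout: underlay multicast group -> (next hop -> replication mode).
#[derive(Default)]
struct McastFwdTable {
    entries: BTreeMap<Ipv6Addr, BTreeMap<NextHopV6, Replication>>,
}

impl McastFwdTable {
    /// Insert or update one next hop for a group; returns the previous mode, if any.
    fn set(&mut self, group: Ipv6Addr, hop: NextHopV6, repl: Replication) -> Option<Replication> {
        self.entries.entry(group).or_default().insert(hop, repl)
    }

    /// Remove every next hop for a group; returns true if the group was present.
    fn clear(&mut self, group: &Ipv6Addr) -> bool {
        self.entries.remove(group).is_some()
    }

    /// Copy out the next hops for a group (empty if the group is unknown).
    fn next_hops(&self, group: &Ipv6Addr) -> Vec<(NextHopV6, Replication)> {
        self.entries
            .get(group)
            .map(|hops| hops.iter().map(|(h, r)| (*h, *r)).collect())
            .unwrap_or_default()
    }
}
```

Keying the inner map by next hop means setting the same hop twice overwrites its replication mode instead of adding a duplicate entry.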

FelixMcFelix and others added 5 commits September 3, 2025 16:26
Also pushes on the requisite extensions for us to fill in
This implements IPv4 and IPv6 multicast packet forwarding with three
replication modes (External, Underlay, All) for rack-wide multicast
delivery across VPCs.

Includes:
  - M2P (Multicast-to-Physical) mappings with admin-scoped IPv6 underlay
  - Per-port multicast group subscriptions for local delivery
  - Multicast forwarding table with configurable replication strategies
  - Geneve multicast option encoding for delivery mode signaling
  - RX path loop prevention (packets marked Underlay skip re-relay)
  - TX/RX path integration with flow table and encapsulation
  - DTrace probes for multicast delivery observability
  - API addition: set_mcast_fwd/clear_mcast_fwd for forwarding table management
  - API addition: mcast_subscribe/mcast_unsubscribe for port group membership
  - API addition: dump_mcast_fwd for observability
  - Testing: XDE integration tests covering all replication modes, validation,
    and edge cases
  - Testing: oxide-vpc integration tests for Geneve encapsulation and parsing
  - Enforce DEFAULT_MULTICAST_VNI (77) for all multicast traffic (groups
    are fleet-wide/cross-VPC) and validate admin-scoped underlay
    addresses (ff04::/16, ff05::/16, ff08::/16).
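
The validation in the last item could look roughly like the following. This is a sketch under assumptions: the prefixes and the VNI value come from the commit message, while `is_admin_scoped_underlay`, `validate_m2p`, and `McastConfigError` are hypothetical names that may not match the PR's code.

```rust
use std::net::Ipv6Addr;

/// VNI value stated in the commit message.
const DEFAULT_MULTICAST_VNI: u32 = 77;

/// Hypothetical error type for rejected multicast configuration.
#[derive(Debug, PartialEq, Eq)]
enum McastConfigError {
    NotAdminScoped(Ipv6Addr),
    WrongVni(u32),
}

/// True if `addr` falls in ff04::/16, ff05::/16, or ff08::/16.
/// A /16 match only needs the first 16-bit segment of the address.
fn is_admin_scoped_underlay(addr: &Ipv6Addr) -> bool {
    matches!(addr.segments()[0], 0xff04 | 0xff05 | 0xff08)
}

/// Hypothetical check for an M2P mapping: the underlay address must be
/// in one of the accepted scopes and the VNI must be the multicast default.
fn validate_m2p(underlay: Ipv6Addr, vni: u32) -> Result<(), McastConfigError> {
    if !is_admin_scoped_underlay(&underlay) {
        return Err(McastConfigError::NotAdminScoped(underlay));
    }
    if vni != DEFAULT_MULTICAST_VNI {
        return Err(McastConfigError::WrongVni(vni));
    }
    Ok(())
}
```

Under this sketch, a link-local group such as ff02::1 is rejected even though it is a valid multicast address, because only the three listed scopes are accepted on the underlay.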
@zeeshanlakhani zeeshanlakhani self-assigned this Oct 20, 2025
@FelixMcFelix FelixMcFelix left a comment

Thanks for the work so far. I haven't looked at the new multicast integration tests yet, but I have a bunch of questions from what I've looked at thus far.

Updates all around for IPv4/IPv6 multicast support: control-plane APIs,
kernel TX/RX implementation, DTrace script, and documented semantics.

Includes:
  - Delivery semantics (leaf-node):
    - Remove multicast relay logic; OPTE is always a leaf node in the
      replication tree
    - Same-sled delivery happens unconditionally on TX for local subscribers
    - RX-path only handles packets destined to this sled (no forwarding)

  - Perf (avoid management_lock in datapath):
    - Move mcast_fwd lookups to per-entry state instead of hitting
      exclusive management lock during TX replication
    - Clone Arc references from per-CPU caches instead of holding per-port
      RwLock guards across packet processing
    - Use state.devs.read() for concurrent dataplane access
    - Hold per-CPU copies of mcast_fwd for duration of TX replication

  - Arbitrary VNI handling:
    - Use DEFAULT_MULTICAST_VNI (77) for fleet-wide multicast delivery
    - Remove per-VPC VNI checks in xde.rs; delegate validation to overlay layer
    - Packets with VNI 77 delivered to all subscribers regardless of VPC

  - Replication flag clarification:
    - Replication enum specifies switch behavior on marked packets:
      - External: Switch replicates to front panel ports (leaving underlay)
      - Underlay: Switch replicates to sleds (within underlay)
      - Both: Switch does both replications
    - Used only on TX-path to inform switch behavior, not for RX-path

  - Routing and MACs:
    - TX replication now uses the correct next hop and route (the
      switch unicast address)
    - Use derived IPv6 multicast MAC for outer destination
    - Route lookup determines underlay port selection via next_hop
    - Simplified underlay routing for admin-scoped (ff04::/16)
      addresses, matching Omicron currently

  - Test infra:
    - MulticastGroup: RAII cleanup for M2P/forwarding entries
    - SnoopGuard: Prevent leaked snoop processes from holding DLPI devices
    - Geneve packet verification with replication flag validation
    - three_node_topology for multi-subscriber scenarios
    - Proactive zone cleanup
    - Standardized around updated semantics

  - Additional refinements:
    - Updated DTrace script (opte-mcast-delivery.d)
    - Improved opteadm output formatting for multicast commands
    - Added anyhow dependency to opte-test-utils
    - Updated documentation clarifying multicast architecture
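
For the "derived IPv6 multicast MAC" item under Routing and MACs, the standard mapping from RFC 2464 is 33:33 followed by the low-order 32 bits of the group address. The sketch below assumes that is the derivation meant here; `ipv6_mcast_mac` is a hypothetical helper name, not a function from this PR.

```rust
use std::net::Ipv6Addr;

/// Derive the Ethernet destination MAC for an IPv6 multicast group
/// (RFC 2464): 33:33 followed by the last four octets of the address.
/// Returns None for non-multicast addresses.
fn ipv6_mcast_mac(group: &Ipv6Addr) -> Option<[u8; 6]> {
    if !group.is_multicast() {
        return None;
    }
    let o = group.octets();
    Some([0x33, 0x33, o[12], o[13], o[14], o[15]])
}
```

For example, the group ff04::1:3 maps to 33:33:00:01:00:03. Because only 32 bits of the group survive, distinct groups can share a MAC, so receivers still need to check the IPv6 destination.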
3 participants